MPC: A Unified Parallel Runtime for Clusters of NUMA Machines

Authors

Marc Pérache

Hervé Jourdren

Raymond Namyst

Abstract

Over the last decade, the Message Passing Interface (MPI) has become a very successful parallel programming environment for distributed-memory architectures such as clusters. However, the architecture of cluster nodes is currently evolving from small symmetric shared-memory multiprocessors towards massively multicore, Non-Uniform Memory Access (NUMA) hardware. Although regular MPI implementations use numerous optimizations to achieve zero-copy, cache-oblivious data transfers within shared-memory nodes, they may prevent applications from exploiting most of the hardware's performance, simply because the scheduling of heavyweight processes is not flexible enough to dynamically fit the underlying hardware topology. This explains why several research efforts have investigated hybrid approaches that mix message passing between nodes with memory sharing inside nodes, such as MPI+OpenMP solutions [1,2]. However, these approaches require considerable programming effort to adapt or rewrite existing MPI applications. In this paper, we present the MultiProcessor Communications environment (MPC), which aims to provide programmers with an efficient runtime system for their existing MPI, POSIX Thread or hybrid MPI+Thread applications. The key idea is to use user-level threads instead of processes on multiprocessor cluster nodes to increase scheduling flexibility, to better control memory allocations, and to optimize the scheduling of communication flows with other nodes. Most existing MPI applications can run over MPC without modification. We obtained substantial gains (up to 20%) by using MPC instead of a regular MPI runtime on several scientific applications.

Similar resources

Towards Whatever-Scale Abstractions for Data-Driven Parallelism

Increasing diversity in computing systems often requires problems to be solved in quite different ways depending on the workload, data size, and resources available. This diversity is increasingly broad in terms of the organization, communication mechanisms, and performance and cost characteristics of individual machines and clusters. Researchers have thus been motivated to design abstractions ...

Scheduling Dynamic OpenMP Applications over Multicore Architectures

Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data among the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting precious information about...

Improving Parallel System Performance with a NUMA-aware Load Balancer

Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high performance computing. On such NUMA nodes, the shared memory is physically distributed into memory banks connected by a network. Owing to this, memory access costs may vary depending on the distance between the processing unit and the memory bank. Therefore, a key element in improving the performance o...

Volume Driven Data Distribution for NUMA-Machines

Highly scalable parallel computers, e.g. SCI-coupled workstation clusters, are NUMA architectures. Thus good static locality is essential for high performance and scalability of parallel programs on these machines. This paper describes novel techniques to optimize static locality at compilation time by application of data transformations and data distributions. The metric which guides the optim...

Meta Process Model and its Portable Parallel Programming Interface MpC

This paper proposes a new portable parallel programming interface, MpC (Meta process C), for the Meta Process Model. The Meta Process Model is a parallel programming paradigm based on a hierarchical shared memory model and an explicit description of parallelism. On these points, this model is different from either the strict Shared Memory Model (SMM) or the Message Passing Model (MPM). The Meta Proces...

Publication date: 2008
